Many speech sounds and animal vocalizations contain components, referred to as complex 
tones, that consist of a fundamental frequency (F0) and higher harmonics. In this study we 
examined single-unit activity recorded in the core (A1) and lateral belt (LB) areas of auditory 
cortex in two rhesus monkeys as they listened to pure tones and pitch-shifted conspecific 
vocalizations ("coos"). The latter consisted of complex-tone segments in which F0 was 
matched to a corresponding pure-tone stimulus. In both animals, neuronal latencies to 
pure-tone stimuli at the best frequency (BF) were ~10 to 15 ms longer in LB than in A1. This 
might be expected, since LB is considered to be at a hierarchically higher level than A1. On 
the other hand, the latency of LB responses to coos was ~10 to 20 ms shorter than to the 
corresponding pure-tone BF, suggesting facilitation in LB by the harmonics. This latency 
reduction by coos was not observed in A1, resulting in similar coo latencies in A1 and 
LB. Multi-peaked neurons were present in both A1 and LB; however, harmonically-related 
peaks were observed in LB for both early and late response components, whereas in 
A1 they were observed only for late components. Our results suggest that harmonic 
features, such as relationships between specific frequency intervals of communication 
calls, are processed at relatively early stages of the auditory cortical pathway, but 
preferentially in LB. 
INTRODUCTION 

Harmonics, one of the essential acoustic structures observed in 
a natural environment, consist of integer multiples of a sound's 
fundamental frequency (F0). Natural harmonic sounds include 
species-specific vocalizations, a sound category with biological 
relevance for most species including humans (Fitch, 2006), and 
most musical instrument sounds. Simultaneous presentations of 
tonal sounds are referred to as chords; if their harmonics have 
a simple frequency interval ratio of 2:1 ("octave") or 3:2 ("per- 
fect fifth"), they are perceived as consonant by both humans 
and nonhuman primates (Schellenberg and Trainor, 1996; Izumi, 
2000). By contrast, complex frequency interval ratios, as in a 
"minor second" (16:15), create roughness of sound and are per- 
ceived as dissonant. Studies have shown that human infants as 
young as 4 months have a preference for consonance over dis- 
sonance (Zentner and Kagan, 1996, 1998), suggesting a possible 
innate bias toward harmonic structure like that contained in 
communication sounds (but see Terhardt, 1974). 

We perceive harmonically related sounds as a whole rather 
than as components of a spectrum. Macaque monkeys, whose 
architectonic structure of cortical auditory regions closely resem- 
bles that in humans (Hackett et al., 2001), judge two melodies 
to be the same when they are transposed by one or two octaves, 
but only if the melodies are tonal (Wright et al., 2000). They
also perceive the pitch of harmonic sounds with a "missing
fundamental" (Tomlinson, 1988), suggesting they experience gestalt
perception of tonal structures just as we do. 

Neurophysiological and fMRI studies of primary auditory cortex
(A1) in several species have reported neurons with multiple
peaks of their response rates in the frequency domain, includ- 
ing ones at harmonically-related intervals (monkey: Brosch et al., 
1999; Kadia and Wang, 2003; cat: Oonishi and Katsuki, 1965; 
Sutter and Schreiner, 1991; Eggermont, 2007; Norena et al., 2008; 
human: Moerel et al., 2013). Meanwhile, neuronal populations in 
monkey A1 show greater evoked responses to dissonant than to 
consonant chords (Fishman et al., 2001) and enhanced responses 
to mistuned harmonics compared to harmonics (Fishman and 
Steinschneider, 2010), perhaps due to the greater salience of the 
mistuned component. However, the neural basis of harmonic 
processing, especially outside primary auditory cortex, remains 
unclear. 

Generally, neurons in the lateral belt (LB), located laterally 
adjacent to the auditory core areas, respond preferentially to 
complex sounds, including band-passed noise and frequency- 
modulated sweeps (Rauschecker et al., 1995; Rauschecker and 
Tian, 2004; Tian and Rauschecker, 2004). Within LB, selectivity 
for conspecific calls is highest in its anterolateral division (AL) 
(Tian et al., 2001). In the present study, the stimulus preference 
of auditory neurons for pure tones vs. harmonic vocalizations 
("coo") was compared in A1 and LB [including the middle lateral 
(ML) and anterolateral (AL) areas of auditory cortex] in behaving 
rhesus monkeys. We hypothesized that spectral integration of har- 
monically related intervals takes place preferentially in LB. Due to 
this integration ("spectral combination sensitivity"; Suga et al., 
1979; Rauschecker et al., 1995) a sound with harmonic struc- 
ture should be more effective than a pure tone, even one at the 
neuron's BF, in evoking a response in this region. 

MATERIALS AND METHODS 
ANIMAL PREPARATIONS 

Two adult male rhesus monkeys (Macaca mulatta), weighing 
7.5-11.5 kg, were prepared for chronic awake electrophysiolog- 
ical recording. Animal care and all procedures were conducted 
in accordance with the National Institutes of Health guide- 
lines, and all experimental procedures were approved by the 
Georgetown University Animal Care and Use Committee. Each 
animal was anesthetized, and a head post and recording chamber 
were attached to the dorsal surface of the skull under asep- 
tic conditions. With guidance from MRI images obtained with 
a 3T scanner (0.5 mm voxel size, Siemens Tim Trio), a cylin- 
drical chamber (65° angle, 19 mm diameter, Crist Instruments, 
Hagerstown, MD) was positioned stereotaxically over the left 
hemisphere of Monkey H, and a custom-made oval chamber 
(20 x 40 mm, Crist Instruments) was positioned over the left 
hemisphere so as to cover most of the supratemporal plane 
of Monkey P. Monkey H had previously been used to acquire 
data from the rostral supratemporal plane through a rostrally 
positioned chamber (Kikuchi et al., 2010); therefore, for this 
experiment, the original chamber was removed and re-implanted 
over a more caudal auditory region to permit access to the mid- 
dle lateral (ML) and anterolateral (AL) auditory areas in addition 
to the auditory core cortex. A post-operative MRI scan confirmed 
that the chambers were positioned correctly. The skull disc within 
the chamber was then removed under aseptic conditions before 
recording was begun. While awake, Monkey H received audi- 
ological screening, which included DPOAE (distortion product 
otoacoustic emission) measurements to assess cochlear function, 
and tympanometry to evaluate middle-ear function. The hearing 
ability of Monkey H was found to be normal. 

BEHAVIORAL TASK 

Behavioral testing and recording sessions were conducted in a 
single-walled acoustic chamber (Industrial Acoustics Company, 
Bronx, NY) installed with foam isolation elements (AAP3, 
Acoustical Solutions). The animal sat in a monkey chair with its 
head fixed, facing a speaker located one meter directly in front 
of it in a darkened room. The animal was trained to perform 
an auditory discrimination task. A single positive stimulus (S+), 
consisting of a 300-ms pink-noise burst (PNB), was pseudorandomly
interspersed among negative stimuli (S−) for 20% of
the trials. The S− consisted of all other stimuli. The animal
initiated a trial by holding a lever for 500 ms, triggering the pre- 
sentation of one of the acoustic stimuli. Lever release within a 
500-ms response window after offset of the S+ led to a water 
reward (~0.2 ml) followed by a 500-ms inter-trial interval (ITI).
Lever release in response to a negative stimulus prolonged the
500-ms ITI by 1 s (timeout). The average inter-onset-interval, 
including correct and incorrect trials, was 2.3 ± 0.45 s (mean ± 
SD). In this report, all electrophysiological analyses are based on 
correct trials only. 

SOUND PREPARATION 

The sound waveform signals were sent through a 12-bit D/A con- 
verter (CIO-DAS1602/12, ComputerBoards) using the CORTEX 
dual-computer system and then amplified, attenuated, and deliv- 
ered through a free-field loudspeaker (Reveal 6, Tannoy), which 
had a flat (±3 dB) frequency response from 63 Hz to 51 kHz. 

All stimuli, including the monkey vocalizations ("coo" calls), 
had a 300-ms fixed duration, gated with a 5-ms rise/fall linear 
ramp. This vocalization was recorded under natural conditions 
in Morgan Island using a directional microphone (ME66 with 
K6 powering module, Sennheiser, CT, USA, frequency response 
at 40-20,000 Hz ± 2.5 dB) with a solid-state portable recorder 
(PMD670, Marantz Professional, London, UK) at a sampling 
rate of 48 kHz (Laboratory of Neuropsychology, NIMH). The 
vocalization consisted of harmonic structures with asymmetri- 
cal spectral contours (Figure 1). Pure tones (PTs) and PNBs were 
generated at a sampling rate of 48 kHz (32 bit) using Adobe 
Audition 1.5. The stimuli were normalized by recording them as
played through the stimulus presentation system, filtering the
recorded signal on the basis of Japanese macaque audiograms
(Jackson et al., 1999), and using the maximum root-mean-square
(RMS) amplitude within a 200-ms sliding window as the level
measure. Stimuli were presented at ~70 dB SPL. Details of the sound equalization
method were described by Kusmierek and Rauschecker (2009). 

STIMULI 

The experiment consisted of several blocks of sessions, each block 
with a different stimulus set. After isolating a neuron, we determined
the neuron's best frequency (BF) and receptive field (RF)
using 23 PTs ranging from 134 Hz (C3) to 21 kHz (E10) at fifth
and tritone intervals in a diatonic scale to obtain a rough tuning
curve, and/or 84 PTs at semitone steps in a chromatic scale
spanning 7 octaves between 110 Hz (A2) and 13.3 kHz (G#9)
to obtain a fine tuning curve. We then used a set of PTs and coos
of various pitches, derived from digital recordings of natural coo calls,
to test responses to complex tones (see Figure 1, Supplementary
material 1). The fundamental frequency (F0) of the coo was varied
using the pitch-shift function in Adobe Audition 1.5. Neural
responses to PT and coo stimuli were compared using a stimulus
set composed of 10 PTs and 10 pitch-matched coo calls presented in either
the same block (76 sessions) or in separate blocks (48 sessions).
The frequency of the PTs and the F0 of the coos ranged from G3
(196 Hz) to C#8 (4435 Hz) in 6-semitone steps. In each recording
session, the stimuli were presented in pseudorandom order
with at least 15 trials per stimulus.

FIGURE 1 | (A) Spectrogram and (B) power spectrum of the coo call used
as a naturalistic harmonic sound in this study. The coo consists of the first
harmonic or fundamental (h1) and five prominent harmonics with
amplitudes above -50 dB (2nd to 6th harmonics, h2-h6; FFT size: 1024,
Hann window). The interval between h1 and h2 is one octave and that
between h2 and h3 is a perfect fifth.
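
The two pure-tone frequency sets described above can be reproduced with a short sketch. It assumes equal-tempered tuning (each semitone is a factor of 2^(1/12)); the function name `semitone_series` is hypothetical and is not taken from the original stimulus-generation procedure, which used Adobe Audition.

```python
def semitone_series(f_start_hz, n_tones, step_semitones=1):
    """Return n_tones frequencies spaced step_semitones apart.

    Assumes equal temperament: one semitone is a factor of 2 ** (1 / 12).
    """
    return [f_start_hz * 2.0 ** (k * step_semitones / 12.0)
            for k in range(n_tones)]


# Fine-tuning set: 84 tones in semitone steps starting at A2 (110 Hz).
fine_set = semitone_series(110.0, 84)

# PT/coo comparison set: 10 tones in 6-semitone (tritone) steps from G3 (196 Hz).
comparison_set = semitone_series(196.0, 10, step_semitones=6)

if __name__ == "__main__":
    print(round(fine_set[-1], 1), round(comparison_set[-1]))
```

Under this assumption the highest tone of the fine set falls near 13.29 kHz (G#9) and the highest tone of the comparison set near 4435 Hz (C#8), in line with the ranges reported in the text.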

ELECTROPHYSIOLOGICAL RECORDINGS 

Prior to each recording session, a reference point above the lat- 
eral sulcus was calculated based on the preoperative MRI scan. 
The position of the supratemporal plane was calculated based on 
the MRI images, and coordinates were mapped onto the cham- 
ber. A guide tube for up to 4 tungsten microelectrodes (0.5-3.0 
MΩ, epoxylite insulation, FHC, Bowdoin, ME) was then lowered
into the brain to this reference point. Each electrode was
independently advanced using a remote-controlled hydraulic,
4-channel customized microstep-multidrive system (NAN-SYS-4,
Plexon, Inc., Dallas, TX). As the electrode was lowered, a silent
gap in the recording signal was usually observed as the electrode 
passed between the frontal and the temporal lobe. The post- 
gap depth at which the first robust spontaneous spiking activity 
was observed was marked as the initial recording site for each 
electrode; it is thus likely that much of the data were recorded 
from the supragranular layers. In addition to the coordinates 
of the recording site, which was perpendicular to the chamber 
grid plane, the coordinates of three additional points on the grid 
plane were determined as reference points. Thus, four reference 
points were available, as needed, to reconstruct the coordinates 
of each recording site in 3D space from individual anatomical 
MRI images (voxel size 1 mm) after each recording session. This 
becomes important for the standardization of the coordinates in 
relation to auditory areas using a population-average brain (see 
below for more details). 

We attempted to select neurons based neither on their stim- 
ulus preference nor on the shape of their spiking activity. The 
signal from each electrode was passed through a head stage with 
gain one and high input impedance (HST/8o50-G1, Plexon Inc.) 
and then split to extract the spiking activity through a pream- 
plifier system (PBX2/16sp/16fp, Plexon Inc.). The spike signals 
were filtered with a pass-band of 150-8000 Hz, further amplified, 
and then digitized at 40 kHz. Voltage-thresholding was applied to 
spiking activity, and spike waveforms were stored after threshold 
crossing. In many cases, the signal on each electrode contained 
activity from more than one neuron. After time-voltage thresh- 
olding, we separated the multi-unit spike trains into single-unit 
spike trains using the Valley Seeking algorithm (Offline Sorter, 
Plexon, Inc.). When we found more than one cluster, the sepa- 
ration quality of multi-clusters was inspected using Multivariate 
Analysis of Variance (MANOVA), and only data with p < 0.01 
were considered to be separate neurons from the same electrode
and included as such in the analysis. During long recording
sessions, temporal stability was sometimes lost due to electrode
drift. Such instability appeared as a discontinuous cluster in time. 
These units were excluded from analysis. 

We also inspected the inter-spike interval (ISI) for each cluster. 
An ISI distribution with entries smaller than the refractory period 
(1 ms) signifies that the recorded spikes were from more than a 
single neuron. In most such cases, we changed the threshold at 
the stage of voltage-thresholding and re-ran the cluster analysis. 
However, if an ISI < 1 ms was still observed in a small proportion 
of a newly sorted ISI distribution (usually <0.3%), the furthest 
spike waveform from the cluster center in 2D feature space with 
ISIs less than the refractory ISI was removed from the unit. 
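
The refractory-period check described above can be sketched as follows. This is a minimal illustration, not the Offline Sorter procedure: the function name `refractory_violation_fraction` is hypothetical, and spike times are assumed to be given in milliseconds for a single sorted cluster.

```python
def refractory_violation_fraction(spike_times_ms, refractory_ms=1.0):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_ms)
    # Inter-spike intervals between consecutive spikes.
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_ms)
    return violations / len(isis)


if __name__ == "__main__":
    example = [0.0, 0.5, 10.0, 20.0, 30.0]  # one ISI of 0.5 ms
    print(refractory_violation_fraction(example))
```

A cluster whose fraction stays well above the small residual level mentioned in the text (usually <0.3%) would be a candidate for re-thresholding and re-sorting.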

Time stamps indicating the timing of auditory stimulus, 
behavioral response, and reward events were sent through 
CORTEX (CIO-DAS1602/12, CIO-DIO24, ComputerBoards), 
and continuous data, such as sound waveforms and eye move- 
ments monitored by an infrared-based eye-tracking system at 
60 Hz (ETL-200, ISCAN, Inc.), used to check the animals' state of 
wakefulness, were sent to a Multichannel Acquisition Processor 
system (MAP, Plexon, Inc.) and then integrated with the spike 
data. During the recording session, spikes were roughly sorted 
by real-time acquisition programs using template matching and 
PCA clustering methods (RASPUTIN, Plexon), and rough estimations
of the frequency- and intensity-tuning profiles of the
neuron were examined online (Neuroexplorer, Nex Technologies, 
MA). Throughout the recording sessions, we monitored neuronal 
activity visually with an oscilloscope (HM407-2, HAMEG) and 
aurally through headphones (HD 280 Professional, Sennheiser). 
Data selection, pre-processing, and data analysis were performed 
using MATLAB and SPSS. All the results in this report are based 
on offline analysis conducted after the experiments were com- 
pleted; the online analysis was used only as a quick evaluation of 
a neuron's characteristics for stimulus selection purposes. 

DATA ANALYSIS 

The spike trains of single-unit activity (SUA) were binned at 
1 ms for each trial, and the average spontaneous firing rate and
its variability per stimulus condition were first calculated during
the baseline period (0-150 ms before sound onset). The spike
trains were convolved with a Gaussian kernel (σ = 10) to construct
spike-density peri-stimulus time histograms (PSTHs) and
then normalized to the average variability (SD) of the raw baseline 
firing rate across all stimulus conditions. Neurons that showed 
responses 2.0 SDs above baseline for 10 consecutive 1-ms sam- 
pling points in the normalized PSTH to at least one sound (other 
than the S+, pink noise) were defined as "auditory-responsive." 
These constraints were imposed in order to exclude spurious 
activity or artifacts. 
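
The responsiveness criterion above can be illustrated with a minimal sketch for a single stimulus condition. Several details are assumptions rather than the authors' MATLAB implementation: the kernel width is taken to be 10 one-millisecond bins, the kernel is truncated at three standard deviations, edges are handled by clamping, and baseline variability is taken as the SD across baseline bins of the trial-averaged raw rate (the paper averages this SD across all stimulus conditions). All function names are hypothetical.

```python
import math
import statistics


def gaussian_kernel(sigma_bins, half_width_sigmas=3):
    """Unit-area Gaussian kernel truncated at +/- half_width_sigmas."""
    half = int(round(sigma_bins * half_width_sigmas))
    weights = [math.exp(-0.5 * (i / sigma_bins) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(rate, kernel):
    """Convolve rate with kernel, clamping indices at the edges."""
    half = len(kernel) // 2
    n = len(rate)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(t + j - half, 0), n - 1)
            acc += w * rate[idx]
        out.append(acc)
    return out


def normalized_psth(trial_counts, baseline_bins, sigma_bins=10):
    """Smoothed PSTH expressed in units of raw-baseline SD."""
    n_trials = len(trial_counts)
    n_bins = len(trial_counts[0])
    mean_rate = [sum(trial[t] for trial in trial_counts) / n_trials
                 for t in range(n_bins)]
    baseline = mean_rate[:baseline_bins]
    base_mean = statistics.fmean(baseline)
    base_sd = statistics.pstdev(baseline)  # SD of the raw (unsmoothed) baseline
    if base_sd == 0:
        raise ValueError("baseline has zero variability")
    smoothed = smooth(mean_rate, gaussian_kernel(sigma_bins))
    return [(r - base_mean) / base_sd for r in smoothed]


def has_run_above(z, threshold_sd=2.0, run_length=10):
    """True if z exceeds threshold_sd for run_length consecutive bins."""
    run = 0
    for value in z:
        run = run + 1 if value > threshold_sd else 0
        if run >= run_length:
            return True
    return False
```

Because the SD is taken from the raw baseline while the threshold is applied to the smoothed PSTH, the criterion is conservative, which matches the stated aim of excluding spurious activity.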

Tuning curves were constructed based on peak response mag- 
nitude (i.e., the maximum magnitude of the peak firing rate 
minus the average baseline firing rate) and then smoothed with
a moving average that included the two neighboring points (i.e.,
two semitones) on each side along the frequency axis. The frequency
that produced the maximum response on the tuning curve func- 
tion was defined as the best frequency (BF) for the neuron. 
Neurons were classified as having either one peak on the frequency
tuning curve function (i.e., a single-peaked neuron) or
more than one peak (i.e., a multi-peaked neuron), where a peak required
clear excitation exceeding both 2.5 SDs above the baseline firing rate and the
half-driven rate of the peak (i.e., 50% of the normalized
peak magnitude in response to the BF). To obtain a clear tuning
peak, we used this criterion, which is stricter than the one defined above
for auditory-responsive neurons.
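
The tuning-curve smoothing and BF selection can be sketched as a five-point moving average followed by an argmax. The shrinking window at the ends of the frequency axis is an assumption (the text does not state how edges were treated), and the function names are hypothetical.

```python
def smooth_tuning_curve(responses, neighbors=2):
    """Moving average over each point and `neighbors` points on each side.

    At the ends of the frequency axis the window shrinks to the points
    that exist (an assumption; edge handling is not described in the text).
    """
    n = len(responses)
    smoothed = []
    for i in range(n):
        lo = max(0, i - neighbors)
        hi = min(n, i + neighbors + 1)
        window = responses[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def best_frequency(freqs_hz, responses, neighbors=2):
    """Frequency at the maximum of the smoothed tuning curve."""
    smoothed = smooth_tuning_curve(responses, neighbors)
    best_index = max(range(len(smoothed)), key=smoothed.__getitem__)
    return freqs_hz[best_index]


if __name__ == "__main__":
    freqs = [100 * (k + 1) for k in range(9)]
    peak_rates = [0, 0, 1, 2, 6, 3, 1, 0, 0]  # baseline-subtracted peaks
    print(best_frequency(freqs, peak_rates))
```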

Neuronal latency was calculated based on the spike density 
function with the Gaussian kernel (σ = 10) described above and 
was defined as the time from sound onset to the first millisec- 
ond bin in which spiking activity rose 2 SDs above baseline for 
10 consecutive 1-ms bins. The SD calculated from the raw data 
(taking the grand average of variability across all stimulus condi- 
tions) generally yielded a higher value than that calculated from 
smoothed data. Minimum latency was defined as the shortest 
latency across all auditory responses of the neuron; this was some- 
times different from the latency in response to the BF, which 
was measured from stimulus onset to the peak magnitude of 
the response. Minimum latency to S+ was calculated only when 
the neuron showed a significant response to S+ in both correct 
and incorrect trials. For incorrect trials, the recording session 
was included in the statistical analysis only if there were at least 
five "miss" trials (without a response to S+) within that ses- 
sion. If there were other types of errors (e.g., premature response 
to the positive stimulus), the trials were excluded altogether to 
avoid incorporating artifactual effects on neuronal activity, e.g., 
effects of motor responses. BF latency was defined as the latency 
in response to the BF, and the average latency was defined as 
the median latency across all auditory responses of the neuron. 
The coo-call latency was defined as the latency in response to 
the coo whose F0 was matched to the neuron's BF. Neuronal 
latencies and spike rates across subdivisions of auditory cor- 
tex were compared using the Kruskal-Wallis test, and post-hoc 
testing between subfields was performed using Tukey's "hon- 
estly significant difference" (HSD) test to correct for multiple 
comparisons. 
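
The latency definition above (first 1-ms bin of a run of 10 consecutive bins at least 2 SDs above baseline) can be sketched as follows. The input is assumed to be a spike-density function already normalized to baseline SD, with one value per millisecond; `onset_latency_ms` is a hypothetical name, and returning `None` when no run is found is a design choice of this sketch.

```python
def onset_latency_ms(z_psth, onset_bin, threshold_sd=2.0, run_length=10):
    """Latency from sound onset to the first bin of a supra-threshold run.

    z_psth    : spike-density values in baseline-SD units, 1-ms bins
    onset_bin : index of the bin at which the sound starts
    Returns None if no run of run_length consecutive bins is found.
    """
    run = 0
    for t in range(onset_bin, len(z_psth)):
        run = run + 1 if z_psth[t] >= threshold_sd else 0
        if run == run_length:
            first_bin = t - run_length + 1
            return first_bin - onset_bin
    return None


if __name__ == "__main__":
    # 150 ms baseline, response starting 30 ms after onset and lasting 20 ms.
    z = [0.0] * 150 + [0.0] * 30 + [3.0] * 20 + [0.0] * 100
    print(onset_latency_ms(z, onset_bin=150))
```

The minimum latency of a neuron would then be the smallest such value across its auditory responses, and the average latency the median of those values.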

To analyze the tuning width of each neuron quantitatively, 
a bandwidth index (BI) was calculated by a method similar to 
one used by Lakatos et al. (2005; formula shown in Figure 4A) 
using a fine-tuning paradigm. The BI was calculated using nor- 
malized firing rates during the entire sound duration (0-300 ms) 
after subtracting mean baseline firing rates across all stimuli in the 
single-peaked neurons (Figure 4A). A BI index close to 1 indicates 
sharp frequency tuning, whereas a BI index near 0 indicates broad 
tuning. We also measured the traditional tuning width of the 
neuron's response peak at 30 dB above threshold (BW30; Sutter 
and Schreiner, 1991; Schreiner and Sutter, 1992). Pure tones with 
sound durations of 300 ms at five different intensities in 10-dB 
steps (30-80 dB SPL) were presented at different frequencies
(E3-E10, 165 Hz-21 kHz) in octave steps. The tones were played in a
pseudorandom order of different frequencies and intensities. The 
neuron's frequency response area (FRA) was determined as a con- 
tour line of 2.5 SD above baseline activity in the frequency and 
intensity domains, and the tuning width at an intensity of 30 dB 
SPL above the neuron's threshold in the FRA was determined 
as the neuron's BW30. To make the bandwidth results directly 
comparable between our study and Sutter and Schreiner's study, 
we computed the BW30 using both single- and multi-peaked
neurons. Since we used a fixed stimulus set (40 tones), we were
not always able to precisely determine the neuron's threshold, 
because some neurons still showed a response at the lowest sound 
intensity we employed. In this case we calculated the BW30 as the 
tuning width at 60 dB SPL, which is 30 dB above the lowest sound 
intensity we used. For the same reasons, if the neuron's threshold 
was as high as 60 dB SPL, we were not able to obtain the BW30. 
Also, due to time constraints, we were not always able to fully 
determine the neuron's FRA after completing the other tests; thus 
our analysis for BW30 is limited to the neurons we actually tested 
in this paradigm. 

Multi-peaked neurons were tested with a fine-tuning paradigm 
that included 84 pure tones in chromatic scales (A2-G#9,
110-13289.8 Hz). We selected the best two (if there were only two)
or three peaks (>2.5 SDs above baseline and the half-driven rate 
of the peak) and assigned them to BF1-BF3 in ascending order 
of their frequency at the peaks (i.e., the lowest peak frequency 
was assigned to BF1). Among these frequencies, the frequency 
that elicited the greatest peak response was selected as the neu- 
ron's overall BF. The criterion for the presence of a peak was a 
response above a specific threshold; the criterion for two peaks 
was a decrease in response below this threshold for at least one 
point between the two peaks, the minimum inter-peak interval 
being separation by more than two semitones. The frequency 
interval ratio (BF ratio) was calculated for all three combina- 
tions (BF1-BF2, BF1-BF3, and BF2-BF3) and normalized by the 
frequency of the lowest peak, a method similar to that used by 
Kadia and Wang (2003). The distribution of BF ratios was then 
binned by one tenth of an octave (Sutter and Schreiner, 1991), 
which is wider than semitone resolution. The distribution of BF 
ratios was calculated based on the peak firing rates during the 
early-response period (0-70 ms from sound onset) and during 
the late-response period (71-300 ms from sound onset). The 70- 
ms time window was used to separate the onset and sustained 
components of the response, since a typical auditory single-unit 
response showed a trough between onset and sustained responses 
at approximately 60-80 ms. The confidence interval (CI) was cal- 
culated from the distribution of BF ratios using the same bin 
width. We also calculated the CI from the distribution of BF ratios 
under the assumption that the peak interval relations of multi- 
peaked neurons were random. The number of occurrences of 
peak intervals was assigned to a given bin of BF ratio and the aver- 
aged distribution after 1000 permutations was computed. Since 
the CI from the average distribution was always lower than the 
former CI using the raw distribution, we employed the CI cal- 
culated from the raw distribution in this study. To compare the 
number of harmonic intervals in multi-peaked neurons, we used 
one or two bins that were centered at the perfect fifth (1.5) and 
octave (2.0). When the BF ratio was in the middle of two bins, we 
used the bin with maximum peak. 
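
The BF-ratio computation can be illustrated with a brief sketch. It assumes that each pairwise ratio is the higher peak frequency divided by the lower one, and that one-tenth-octave bins have edges at integer multiples of 0.1 octave; the exact bin placement used in the study is not specified, and the function names are hypothetical.

```python
import math
from itertools import combinations


def bf_ratios(peak_freqs_hz):
    """Pairwise ratios (higher / lower) for two or three peak frequencies."""
    peaks = sorted(peak_freqs_hz)
    return [higher / lower for lower, higher in combinations(peaks, 2)]


def octave_bin(ratio, bins_per_octave=10):
    """Index of the one-tenth-octave bin containing the ratio.

    Assumes bin edges at integer multiples of 1 / bins_per_octave octaves.
    """
    return math.floor(math.log2(ratio) * bins_per_octave)


if __name__ == "__main__":
    # Hypothetical neuron with peaks at 1, 1.5, and 2 kHz.
    for ratio in bf_ratios([1000.0, 1500.0, 2000.0]):
        print(round(ratio, 3), octave_bin(ratio))
```

With these example peaks, BF1-BF2 forms a perfect fifth (1.5) and BF1-BF3 an octave (2.0); a histogram of bin indices across neurons would give the BF-ratio distribution described above.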

Data from the subfields of LB (i.e., ML and AL) were grouped 
whenever the sample size for individual subfields was too small to 
allow for statistical testing. 

ASSIGNMENT OF RECORDING LOCATIONS TO CORTICAL AREAS 

The recording sites in this study were assigned to either the auditory
core region (primary auditory cortex, A1) or to the auditory
LB region [middle lateral field (ML) and anterolateral field (AL)]
using the following criteria. 

To reconstruct the boundaries between cortical areas along the 
anterior-posterior (AP) axis, in particular the boundary between 
ML and AL, we employed the standard approach of using the 
cortical tonotopic gradient map based on the neurons' best fre- 
quency (BF, Figure 2B; cf. Rauschecker et al., 1995). The mean 
BF along the AP axis was used to calculate the reversal point of 
the BF tuning curve along the AP axis (monkey H, 15.5; monkey 
P, 14.5, Figure 2A). 



To reconstruct the boundaries between cortical areas along the 
medial-lateral (M-L) axis, i.e., between the putative core and LB 
regions, a similarly precise approach based on functional criteria 
cannot be taken, even though the neurons' response characteris- 
tics, in particular bandwidth tuning or tone-vs.-bandpass-noise 
preference, do differ between core and LB (Rauschecker et al., 
1995). Therefore, we used an approach based on population-average
T1-weighted MRI images (112RM-SL) provided by
McLaren et al. (2009, 2010; the standardized atlas of the rhesus
macaque brain can be downloaded from: www.brainmap.wisc.edu/monkey.html).
This approach permits us to standardize electrophysiologically
identified regions into a common space across
multiple individuals and to assign the xyz-coordinate of each
recording site to one of these regions. Most importantly for our
study, these coordinates can be used to identify the medial-lateral
boundary between core and LB.

FIGURE 2 | (A) Recording sites: the location of A1 and lateral belt
(LB) are depicted on the left supratemporal plane (STP) of Monkeys
H and P. The anterior-posterior (AP) and medial-lateral (ML)
coordinates were transformed into standardized coordinates based on
the population-average macaque brain (McLaren et al., 2009, 2010).
The curved solid black line on each map shows the estimated
border between core and LB based on the atlas from a single
subject (Saleem and Logothetis, 2007). The anterior-posterior border
(dotted line) was drawn from the frequency reversal observed on
mapping the best frequencies (BFs, see Figure 2B); this reversal
occurred at a slightly different AP coordinate in the two monkeys
(Monkey H, 15.5; Monkey P, 14.5). Shown on the right are two
coronal MRI images of monkey P at the indicated AP levels, with
A1 and LB on the supratemporal plane (STP) highlighted in red and
blue, respectively. (B) Best frequency (BF) maps for each of the
two animals. Frequency reversal on the BF maps was used to
determine the anterior-posterior boundary of the auditory subdivisions
for each animal. The dotted line was calculated based on the lowest
frequency reversal point using mean values, smoothed by a 3-mm
sliding window, along the AP direction. The scale bar on the right
shows the frequency range of pure tones used to estimate the
neurons' BF.

The 112RM-SL database is the average of 112 rhesus macaque 
brains co-registered with the single-subject atlas (D99-SL) of 
Saleem and Logothetis (2007), which was itself co-registered with 
histological slices (Nissl, parvalbumin, SMI-32, calbindin and cal- 
retinin) aligned to cytoarchitectonic areas. We used the Analysis 
of Functional NeuroImages (AFNI) software (Cox, 1996) for
MRI processing. The volumes were reconstructed from the original
T1-weighted image of an individual animal in AFNI using
the to3d function. To mask out nonbrain areas occupied by a
thick mass of muscle around the skull, the skull was removed
from the images using the 3dSkullStrip function, and the skull-stripped
images were then aligned to the population-average
brain (112RM-SL) to generate a transformation matrix that con- 
verted each xyz-coordinate of the recording site into standardized 
space. Since inter-individual variability is large, especially across 
the width of the supratemporal plane (STP), each co-registered 
brain yielded discrepancies for some brain structures, especially 
when the single-subject atlas was the reference for each site. In 
that case, we chose the gray matter closest to the site and visually 
assigned the cortical region based on the atlas. 
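
Applying such a transformation matrix to a recording-site coordinate amounts to an affine mapping. The sketch below assumes a 3 x 4 matrix (a 3 x 3 linear part plus a translation column) and uses made-up numbers; it is not the matrix produced by AFNI for either animal, and `apply_affine` is a hypothetical helper.

```python
def apply_affine(matrix_3x4, xyz):
    """Map an (x, y, z) coordinate through a 3 x 4 affine matrix.

    Each row holds three linear coefficients followed by a translation.
    """
    x, y, z = xyz
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
                 for row in matrix_3x4)


if __name__ == "__main__":
    # Illustrative matrix only: identity rotation with a small translation.
    example_matrix = [
        [1.0, 0.0, 0.0, 2.0],
        [0.0, 1.0, 0.0, -1.0],
        [0.0, 0.0, 1.0, 0.5],
    ]
    print(apply_affine(example_matrix, (10.0, 20.0, 30.0)))
```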

RESULTS 

The animals performed an auditory discrimination task on average
at 96.3% accuracy (86.4% correct responses for the S+ trials
and 98.8% correct for the S− trials). Most errors (1.8%) were
failures to release the lever to the S+ ("miss" errors); the other 
types of errors were either premature responses to the S+ [lever 
release before sound offset (0.9%)] or "false-alarm" errors [lever 
release to a negative stimulus (0.9%)]. 

Neurons in A1 and LB were recorded either separately or,
more often, simultaneously using two to three electrodes.
The spontaneous firing rates showed no significant differences
across the three divisions of auditory cortex (monkey
H: A1, 11.7 ± 8.6 spikes/s; ML, 15.5 ± 10.9 spikes/s; AL,
11.2 ± 8.1 spikes/s, p = 0.07; monkey P: A1, 14.0 ± 10.7 spikes/s;
ML, 16.9 ± 12.4 spikes/s; AL, 15.0 ± 9.8 spikes/s,
p = 0.18, Kruskal-Wallis test, mean ± SD).

HIERARCHICAL PROCESSING IN THREE SUBDIVISIONS OF AUDITORY 
CORTEX (A1, ML, AND AL) IN RESPONSE TO PURE TONES (PT) AND 
PINK-NOISE BURSTS (PNB) 

We first analyzed the responses of 596 single neurons (Al, 238; 
ML, 167; AL, 191) to pure tones (PTs). All three subfields of 
the auditory cortex generally responded to the PTs across a 
wide range of frequencies. The proportion of auditory neurons 
that showed a significant response to PTs decreased gradually 
from Al to ML to AL (Al, 79%; ML, 74%; AL, 67%; see 
Table 1 and Figure 3 A). Although there was no overall statisti- 
cally significant effect of PT responsiveness across subdivisions 
(chi-square test, x 2 = 2.1, df = 2, p > 0.05), minimum onset 
latencies to PTs, i.e., the shortest latencies among all the responses 



Table 1 | Population of auditory neurons driven by pure tones (PT) in 
different subfields of auditory cortex. 



Subject 


A1 % 


ML % 


AL % 


Total 


Monkey H 


74/96 (77) 


58/75 (77) 


99/148 (67) 


231/319 


Monkey P 


114/142 (80) 


66/92 (72) 


29/43 (67) 


209/277 


Total 


188/238 (79) 


124/167 (74) 


128/191 (67) 


440/596 



of a neuron, differed significantly across the three subdivisions 
(p < 0.001, Kruskal-Wallis test), being shortest in A1 [median: 
28 ms, 25th percentile (Q1) = 17 ms, 75th percentile (Q3) = 
42 ms, N = 188] followed by ML and AL (ML: median: 35 ms, 
Q1 = 26 ms, Q3 = 50 ms, N = 124, p < 0.001; AL: median: 
44 ms, Q1 = 22 ms, Q3 = 78 ms, N = 128, p < 0.001, Tukey's 
HSD test, Figure 3B, Table 2). Minimum latencies to the PNBs 
on correct trials were also compared across the three subdivisions: 
Like the gradual change in minimum latency observed in 
response to the PTs (Figure 3C, left), the latency to the PNBs 
differed significantly across the subdivisions (Figure 3C, right, 
p < 0.001, Kruskal-Wallis test). The post-hoc tests showed that the 
median latency in A1 and ML differed significantly from that in 
AL, though the A1 and ML latencies did not differ from each 
other (A1: median: 36 ms, Q1 = 25 ms, Q3 = 47 ms; ML: median: 
39 ms, Q1 = 31 ms, Q3 = 49 ms; AL: median: 51 ms, Q1 = 38 ms, 
Q3 = 74 ms; A1 vs. AL: p < 0.001; ML vs. AL: p < 0.05, Tukey's 
HSD test). To understand the variability between the two monkeys 
and three subdivisions of the auditory cortex, "monkey" 
and "area" were included as between-subject condition factors in 
two-way ANOVAs. The analysis revealed that for PT minimum 
latencies, there was a significant main effect of area [F(2, 434) = 
4.929, p < 0.01] and monkey [F(1, 434) = 12.134, p < 0.01] but 
no interaction [F(2, 434) = 1.867, p = 0.16]. For PNB latencies, 
there was a significant main effect of area [F(2, 209) = 4.375, p < 
0.02] but not monkey [F(1, 209) = 0.3495, p = 0.56] and there 
was no interaction [F(2, 209) = 0.14, p = 0.87]. The main effect 
of area was present for both PT and PNB latencies. Together, 
these data suggest that sound processing occurs along a cortical 
hierarchy from A1 to ML and AL. 
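
The latency measures above can be illustrated with a minimal sketch. It assumes that each neuron has one onset latency per stimulus (or no value when the stimulus evoked no significant response), takes the minimum across stimuli as in the definition given above, and then summarizes a population by its quartiles. The function names, the example values, and the "inclusive" quartile convention are assumptions made for illustration; the onset-detection criterion itself is described in the Methods and is not reproduced here.

```python
from statistics import quantiles
from typing import Optional, Sequence


def minimum_latency(latencies_ms: Sequence[Optional[float]]) -> Optional[float]:
    """Shortest onset latency among all responses of one neuron.

    None entries mark stimuli that evoked no significant response.
    """
    valid = [lat for lat in latencies_ms if lat is not None]
    return min(valid) if valid else None


def summarize(latencies_ms: Sequence[float]) -> dict:
    """Median and quartiles of a population of minimum latencies.

    The quartile convention ("inclusive") is an assumption of this sketch.
    """
    q1, q2, q3 = quantiles(latencies_ms, n=4, method="inclusive")
    return {"Q1": q1, "median": q2, "Q3": q3, "N": len(latencies_ms)}


# Hypothetical per-stimulus latencies (ms) for three neurons; not study data.
neurons = [[41.0, None, 28.0], [35.0, 52.0], [None, None]]
minima = [m for m in map(minimum_latency, neurons) if m is not None]
```

Neurons without any significant response drop out of the population summary, which is one way the N values for latency can be smaller than the number of recorded neurons.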

We also compared the electrophysiological responses to the 
S+ (PNB) during correct trials (17.3% of all trials) and incorrect 
("miss") trials (1.8% of all trials; see Methods). The minimum 
response latency for correct and incorrect trials during the same 
recording session did not differ significantly in either A1 or LB (A1: 
median: 35 vs. 37 ms, Q1: 22 vs. 21 ms, Q3: 47 vs. 47 ms, respectively, 
N = 54, p = 0.20; LB: median: 46 vs. 45 ms, Q1: 34 vs. 33 ms, Q3: 
62 vs. 66 ms, respectively, N = 40, p = 0.41, Wilcoxon signed-rank 
test). The latencies were generally longer in LB than in A1 for both 
trial types (correct trials: p < 0.001; incorrect trials: p < 0.05), as 
expected from the responses to PTs and PNBs, consistent with the 
notion of a cortical hierarchy from A1 to LB. 

MULTI-PEAKED NEURONS WITH HARMONICALLY RELATED INTERVALS 
IN A1 AND LB 

We constructed a frequency tuning curve using the neuron's peak 
magnitude to calculate the best frequency (BF) of each audi- 
tory neuron. Among 205 neurons recorded in the fine-tuning 

FIGURE 3 | (A) Proportion of auditory neurons driven by pure tones in A1, 
ML, and AL. The proportion was greatest in A1, followed by ML and AL. (B) 
Cumulative proportion of minimum response latency to pure tones. 
Latencies in LB (i.e., ML and AL) were significantly longer than in A1 
[median: A1: 28 ms, N = 188; ML: 35 ms, N = 124; AL: 44 ms, N = 128; A1 
vs. ML, p < 0.001; A1 vs. AL, p < 0.001 (Tukey's HSD test)]. (C) Minimum 
response latency to pure tones (PT) (left) and pink-noise bursts (PNB) (right) 
across the three subdivisions of auditory cortex. The central marks of the 
boxplots show the median latency between the 25th and 75th percentiles. 
The asterisks denote the significance level of post-hoc testing (Tukey's HSD 
test, *p < 0.05, ***p < 0.001). 



pure-tone paradigms using chromatic scales with semitone steps, 
142 neurons (69%) were single-peaked, and 63 neurons (31%) 
were multi-peaked. The proportion of multi-peaked neurons in 
A1 and LB did not differ significantly in Monkey P (A1 vs. LB: 47 
vs. 37%, A1: N = 17; LB: N = 10; χ² = 0.65, df = 1, p > 0.05) 
but was significantly lower in LB than in A1 in Monkey H (35 vs. 
17%, A1: N = 16; LB: N = 16; χ² = 5.85, df = 1, p < 0.05). The 
distribution of BFs for multi-peaked neurons (N = 63) was not 
significantly different from that for single-peaked neurons 
(N = 142), when the BFs eliciting the neurons' maximum peak 
response were compared (Monkey H: mean ± SE, 2826 ± 619 Hz vs. 
3625 ± 430 Hz, p = 0.63; Monkey P: 3244 ± 779 Hz vs. 2560 ± 614 Hz, 
p = 0.28, multi-peaked vs. single-peaked, Wilcoxon rank-sum test). 

We next analyzed the sharpness of frequency tuning in A1 
and LB using a bandwidth index (BI; see Methods) similar to 
the one used by Lakatos et al. (2005), with a BI close to 1 
indicating sharp frequency tuning, and a BI near 0 indicating 
broad tuning. There was a main effect of recording site 
on BI [A1: 0.52 ± 0.02, N = 47; ML: 0.52 ± 0.02, N = 36; AL: 
0.46 ± 0.01, N = 59 (mean ± SE), Kruskal-Wallis test, p < 0.02, 
Figure 4A] with A1 neurons displaying sharper tuning compared 
to AL (Tukey's HSD test, p < 0.02). Tuning width was further 
examined in a subset of neurons with the traditional approach 
measuring BW30 (Sutter and Schreiner, 1991; see Methods). 
This analysis had a similar outcome with a significant difference 
in frequency tuning width between A1 and AL (A1: 1.9 ± 0.21 
octaves, N = 29; ML: 2.9 ± 0.66 octaves, N = 19; AL: 3.0 ± 0.51 
octaves, N = 14, mean ± SE, p < 0.05, Wilcoxon rank-sum test, 
Figure 4B). 

Figures 5A,B show an example of a single neuron from area 
ML with a multi-peaked response. Whereas the tuning of the 
excitatory onset response was broad across a wide range of fre- 
quencies, several discrete frequency peaks can be distinguished 
in the sustained response after the drop-off of the initial onset 
response (see raster plots in Figure 5A). This neuron's BF (fre- 
quency with highest peak response; 112.8 spikes/s) was 1865 Hz 
(A#6). However, the sustained response showed three additional 
peaks above threshold, which were all distinct in frequency 
(440 Hz = A4, 622 Hz = D#5, and 932 Hz = A#5). The best three 
peaks were chosen based on the peak firing rates and assigned as 
BF1 (D#5, 59.4 spikes/s), BF2 (A#5, 60.6 spikes/s), and BF3 (A#6, 
112.8 spikes/s) in order of ascending frequency (see Methods). 
The frequency ratios of the best three BFs in relation to each other 
were 3.0 (BF3/BF1, 19 semitones), 1.5 (BF2/BF1, 7 semitones), 
and 2.0 (BF3/BF2, 12 semitones), which correspond to "perfect" 
harmonic or musical intervals (i.e., ratios of 2.0 = octave; and 
ratios of 1.5 and 3.0 = "perfect fifths"). 
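
The correspondence between semitone steps and frequency ratios in this example can be checked with a short sketch. It assumes the equal-tempered scale implied by the semitone-step stimuli, in which a step of n semitones multiplies frequency by 2^(n/12); a 7-semitone step therefore gives a ratio of about 1.498 rather than exactly 1.5. The function and variable names are illustrative only.

```python
import math
from itertools import combinations


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def ratio_from_semitones(n_semitones: float) -> float:
    """Frequency ratio spanned by n equal-tempered semitones."""
    return 2.0 ** (n_semitones / 12.0)


# The three best frequencies of the example ML neuron described above.
peaks_hz = {"BF1": 622.0, "BF2": 932.0, "BF3": 1865.0}

intervals = {}
for (name_lo, f_lo), (name_hi, f_hi) in combinations(peaks_hz.items(), 2):
    steps = round(semitones_between(f_lo, f_hi))
    intervals[f"{name_hi}/{name_lo}"] = (steps, round(f_hi / f_lo, 2))
```

Rounding to the nearest semitone is appropriate here because the stimuli were presented on a chromatic scale; for frequencies that are not on such a scale, the unrounded value of `semitones_between` would be the more informative quantity.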

If LB contributes more than A1 to the spectral integration 
of harmonically-related interval information, the distribution of 
distances between two peaks of multi-peaked neurons might tend 
toward harmonically-related interval ratios more often in LB than 
in A1. We calculated the interval ratio between best frequencies 
(BF ratio) in all multi-peaked neurons (Figure 5C). This was 
done separately for early (0-70 ms from sound onset) and late 
responses (>70 ms). The distribution of BF ratios in LB showed a 
maximum at the perfect-fifth interval (3:2 = 1.5) in both the early 
and late periods [above the confidence interval (CI) at 99.9%] 
and at the octave (2:1 = 2.0, above the CI at 99.5%) for late 
periods, whereas BF ratios in A1 showed a peak at the perfect-fifth 
interval only in the distribution of late responses (above the 
CI at 99.9%). A significant difference in the distribution of peak 
distances was found between A1 and LB for early (A1: N = 72 
intervals measured, LB: N = 28, p < 0.01, Wilcoxon signed-rank 
test) but not for late responses, when the same bin-by-bin paired 
comparison was performed (A1: N = 65 intervals, LB: N = 52 
intervals, p = 0.60). The proportion of harmonic intervals in the 
early period was significantly greater in LB than in A1, and the 
different bin widths did not affect the results (bin width = 2: 39 
vs. 15%, χ² = 5.28, df = 1, p < 0.025; bin width = 1: 25 vs. 4%, 
χ² = 8.75, p < 0.005, χ² test). 

RESPONSE TO PITCH-SHIFTED COOS 

If, as hypothesized, the spectral integration of harmonically 
related frequencies takes place in LB, a sound with harmonic 
structure should be more effective in evoking a response in this 
area than would a pure tone, even one at the BF. To test this 
hypothesis, we shifted the pitch of a coo call to match the 
neuron's BF and compared the responses between A1 and LB. A coo 
call was used because a previous study showed that LB neurons 
can be driven quite selectively by species-specific vocalizations 
(Tian et al., 2001). Auditory responses were sometimes elicited 
by a coo with the same pitch as a low tone sharing the same 
F0, particularly in neurons responsive to low frequencies, even if 
Table 2 | Minimum latencies to pure tones (PT) and pink noise bursts (PNB) in different subfields of auditory cortex. 
Q1, Q2, and Q3 are latency quartiles in ms; N is the number of neurons. 

                                                       Tukey's HSD test
Stimulus  Subject   Statistic  A1           ML           AL    A1 vs. ML  A1 vs. AL  ML vs. AL
PT        Monkey H  Q1 (25%)   17           28           23    p < 0.01   p < 0.001  NS
                    Q2 (50%)   30           40           45
                    Q3 (75%)   42           61           91
                    N          74           58           99
          Monkey P  Q1 (25%)   16           19           19    p < 0.05   NS         NS
                    Q2 (50%)   27           31           41
                    Q3 (75%)   38           45           57
                    N          114          66           29
          Total     Q1 (25%)   17           26           22    p < 0.001  p < 0.001  NS
                    Q2 (50%)   28           35           44
                    Q3 (75%)   42           50           78
                    N          188          124          128
PNB       Monkey H  Q1 (25%)   25           41           39    NS         p < 0.001  NS
                    Q2 (50%)   32           42           53
                    Q3 (75%)   42           46           74
                    N          52           4            29
          Monkey P  Q1 (25%)   26           29           37    p < 0.05   NS         NS
                    Q2 (50%)   [illegible]  [illegible]  43
                    Q3 (75%)   48           50           73
                    N          72           45           13
          Total     Q1 (25%)   25           31           38    NS         p < 0.001  p < 0.05
                    Q2 (50%)   36           39           51
                    Q3 (75%)   47           49           74
                    N          124          49           42

BI = (FR to BF - Mean FR to non-BFs) / FR to BF 




FIGURE 4 | Tuning width in A1, ML, and AL measured using (A) a 
bandwidth index (BI) and (B) a traditional BW30 index. BI was 
calculated based on the formula shown using the fine-tuning paradigm (see 
Methods). There was a main effect of recording site on BI (Kruskal-Wallis 
test: p < 0.02) with significantly sharper tuning in A1 compared to that in 
AL (Tukey's HSD test, p < 0.01). Tuning width using BW30 gave similar 
results with significantly sharper tuning in A1 than AL (p < 0.05, Wilcoxon 
rank-sum test). Means are plotted and standard errors are represented by 
bars. 
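
The bandwidth index shown with Figure 4 can be sketched as follows. This is a minimal illustration that assumes the tuning curve is a list of driven firing rates (one per tested frequency), that the BF is the frequency with the highest rate, and that "non-BFs" means all other tested frequencies; the exact preprocessing is given in the Methods, and the function name and example values are hypothetical.

```python
from typing import Optional, Sequence


def bandwidth_index(rates: Sequence[float], bf_index: Optional[int] = None) -> float:
    """BI = (FR to BF - mean FR to non-BFs) / FR to BF.

    Values near 1 indicate sharp tuning; values near 0 indicate broad tuning.
    """
    rates = list(rates)
    if len(rates) < 2:
        raise ValueError("need at least one BF and one non-BF rate")
    if bf_index is None:
        # Assumption: the BF is the frequency with the highest firing rate.
        bf_index = max(range(len(rates)), key=rates.__getitem__)
    fr_bf = rates[bf_index]
    if fr_bf <= 0:
        raise ValueError("firing rate at the BF must be positive")
    non_bf = rates[:bf_index] + rates[bf_index + 1:]
    mean_non_bf = sum(non_bf) / len(non_bf)
    return (fr_bf - mean_non_bf) / fr_bf


# Hypothetical tuning curves (spikes/s), not data from the study.
sharp = bandwidth_index([10.0, 10.0, 50.0, 10.0, 10.0])
flat = bandwidth_index([20.0, 20.0, 20.0])
```

In these made-up examples, the sharply tuned curve gives a BI of 0.8 and the flat curve gives 0, matching the interpretation of the index stated in the text.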



the coo's overtones were outside the neuron's excitatory receptive 
field (RF). 

The classification of neurons was based on each neuron's RF, 
and this was limited to neurons (N = 24) whose lower PT 
frequency cutoff fell within the range of frequencies we used in 
this paradigm (196-4435 Hz, see Methods). The neurons were 
classified into two groups: (1) frequency-representative neurons 
that responded to coo stimuli when the overtone harmonics fell 
into the neuron's RF, even though the F0 of the coo was 
outside the RF; (2) pitch-selective neurons that responded when 
the F0 of coo stimuli fell into the RF but did not respond to 
coo stimuli when the overtone harmonics fell into the neuron's 
RF. Other types of neurons showed various kinds of responses 
that deviated from the above two groups; these neurons were 
categorized as "non-classified" (n = 64). Figure 6A illustrates an 
example of a neuron in A1 that showed a frequency-representative 
response (Unit A). This neuron had a single peak (Figure 6B), 
and its BF was 2218 Hz (C#7). Since the overtone harmonics 
(h2-h6, Figure 1) fell into the neuron's RF even when the F0 
of the coo was outside the RF, the tuning curve in response to 
pitch-shifted coo calls was broader than that in response to PTs 
(Figure 6B). The onset latency to the BF was 52 ms, whereas the 
latency in response to the coo (whose F0 matched the BF) was 
46 ms, with no difference in mean firing rate to the two 
stimuli (PT, 62.2 ± 20.7 spikes/s; coo, 73.2 ± 19.9 spikes/s; p = 0.13, 
Wilcoxon rank-sum test, Figure 6C). Of the 88 neurons tested, 21 
(24%) were of this type, and there was no difference in 
proportion between A1 and LB (8 vs. 13 neurons, respectively, χ² = 2.1, 
df = 1, p = 0.15). Although the number is small (N = 3), there 
FIGURE 5 | Response of multi-peaked neurons. (A) Raster plots of 
single-unit activity in an example neuron from area ML with multi-peaked 
tuning (bin width 1 ms). Plots are aligned to sound onset (vertical red line) 
in response to 84 tones presented in semitone-steps between 110 Hz (A2) 
and 13.3 kHz (G#9). Sound offset is indicated by vertical black line at far 
right. Y-axis indicates frequency (Hz) of pure-tone stimuli on a log scale. 
The neuron responded to four distinct frequency bands (440, 622, 932, 
and 1865 Hz), which are indicated by arrows. (B) Rate tuning curve of the 
multi-peaked neuron shown in (A) based on peak response magnitude 
during the entire duration of the sound (normalized by subtracting baseline 
activity and 5-point smoothing). Four peaks above the half driven rate 
(defined as 50% of the highest normalized peak firing rate, here indicated 
as a dotted line) were detected, and we chose the three highest peaks as 
best frequencies (BF1, BF2, BF3) to analyze the interval relations of 
multiple peaks. In this neuron, the frequency interval ratios of the three 
peaks were 7, 12, and 19 semitones, which correspond to 1.5 (BF2/BF1), 
2.0 (BF3/BF2), and 3.0 (BF3/BF1), all of which are harmonically related (i.e., 
perfect harmonic or musical intervals): P5 (perfect fifth), 7 semitones apart; 
P8 (perfect eighth, or octave), 12 semitones apart, and another P5 (perfect 
fifth), 19 semitones apart. (C) Distribution of peak distance in multi-peaked 
neurons of A1 and LB. The distribution of BF ratios was calculated based 
on the peak firing rates during the early-response period (0-70 ms from 
sound onset) and during the late-response period (71-300 ms from sound 
onset). The interval distance between two peaks was estimated based on 
frequency interval ratio (BF ratio, x-axis), and the relative frequency (i.e., 
number of intervals relative to the total number of intervals in each 
subdivision) is shown on the y-axis. The confidence interval (CI) at 99.5% 
is indicated by a dashed line, and at 99.9%, it is indicated by a dotted line. 
BF ratios above 5.0 are not shown in the figure for display purposes; 
however, none of those peaks reached the CI threshold of 99.5%. 



were neurons that exhibited similar tuning in response to PTs and 
pitch-shifted coos with a shorter latency to the coo than to the PT- 
BF, and an enhanced response to the coo relative to the response 
to the PT-BF (Supplementary Figure 1). 
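
The two response classes defined above can be expressed as a simple decision rule. The sketch below is a simplified reading of that description, not the authors' procedure: it assumes the RF is a single frequency band with inclusive edges, that overtones h2 to h6 are integer multiples of F0, and that each coo trial has already been scored as response or no response. All names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass
class CooTrial:
    f0_hz: float      # fundamental frequency of the pitch-shifted coo
    responded: bool   # True if the coo evoked a significant response


def in_rf(freq_hz: float, rf_hz: Tuple[float, float]) -> bool:
    """True if a frequency lies inside the RF (inclusive edges assumed)."""
    return rf_hz[0] <= freq_hz <= rf_hz[1]


def classify(trials: Iterable[CooTrial], rf_hz: Tuple[float, float],
             harmonics: Sequence[int] = (2, 3, 4, 5, 6)) -> str:
    """Assign a neuron to one of the response classes described in the text."""
    f0_in_rf = []       # coos whose F0 falls inside the RF
    overtone_only = []  # coos with F0 outside the RF but an overtone inside
    for trial in trials:
        if in_rf(trial.f0_hz, rf_hz):
            f0_in_rf.append(trial)
        elif any(in_rf(h * trial.f0_hz, rf_hz) for h in harmonics):
            overtone_only.append(trial)

    if any(t.responded for t in overtone_only):
        return "frequency-representative"
    if overtone_only and any(t.responded for t in f0_in_rf):
        return "pitch-selective"
    return "non-classified"


# Hypothetical neuron with an RF of 1500-3000 Hz; not data from the study.
rf = (1500.0, 3000.0)
freq_rep = classify([CooTrial(554.0, True), CooTrial(2218.0, True)], rf)
pitch_sel = classify([CooTrial(554.0, False), CooTrial(2218.0, True)], rf)
```

A neuron is only called pitch-selective here if at least one overtone-only coo was actually tested and failed to drive it, which mirrors the requirement that the relevant frequencies fall within the tested range.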

AVERAGE RESPONSE LATENCIES TO PURE TONES AT THE BF AND TO 
F0-MATCHED COMPLEX TONES ("COO" CALLS) 

The gradual increase in minimum latency from A1 to ML, and 
from ML to AL in response to both PT and PNB (without pitch), 
as shown earlier in Figure 3C, suggests that this auditory 
information is processed hierarchically along these three subdivisions. 
Furthermore, the presence of harmonically-related interval ratios 
between peaks of multi-peaked neurons in their onset responses 
in LB but not in A1 (Figure 5C) suggests that harmonic 
processing occurs initially and preferentially in LB rather than A1. As one 
might predict from this hierarchy of PT processing (Figure 3C), 
the average BF latency was also longer in LB than in A1 (LB: 
median: 59 ms, Q1 = 42 ms, Q3 = 90 ms, N = 93; A1: 38 ms, 
Q1 = 27 ms, Q3 = 50 ms, N = 75; p < 10⁻⁵, Wilcoxon rank-sum 
test, Figure 7). By contrast, the latency to coos that were 
F0-matched to the BFs did not differ between LB and A1. This 
was due to the response latencies in LB to coos being significantly 
shorter than the response latencies to PTs at the BF (coo: median: 
43 ms, Q1 = 28 ms, Q3 = 62 ms, N = 36; PT: 59 ms, Q1 = 42 ms, 
Q3 = 90 ms, N = 93; p < 0.01). 

The above analysis restricts latency calculation to BF and the 
corresponding F0-matched coo. If minimum coo latency is 
calculated instead (i.e., the shortest latency of auditory responses 
elicited by all effective coos), similar results are obtained. Again, 
minimum coo latencies showed no difference between A1 and LB. 
FIGURE 6 | Example of a neuron in A1 showing 
frequency-representative responses to pure tones and pitch-shifted 
coos. (A) Spike rasters (upper half of each graph) and PSTHs (lower half) 
aligned to sound onset (red dotted line) in response to pure tones (left 
column) and pitch-shifted coos (right column) at the same pitch 
(196-4435 Hz at half-octave steps). The blue line marks sound offset. (B) 
Tuning curves for Unit A, shown in (A). The tuning curves are derived from 
peak responses to PTs and to the pitch-shifted coos, with the coo curve 
plotted against the fundamental frequency (F0) of the coos. Black horizontal 
line indicates half-driven rate. This type of neuron continued to show high 
firing rates when the overtone harmonics of the pitch-shifted coo fell into 
the neuron's RF, even when the F0 of the coo was outside the RF. (C) Averaged 
PSTH of the responses of Unit A to its BF (solid blue line) and to a coo 
(solid red line) whose F0 was matched to the BF. The latency in response to 
the BF-PT was 46 ms, whereas the latency in response to the coo was 52 ms. 
Sound duration period is shown in pink; sound onset is indicated by the 
vertical edge at time zero. 



In monkey H, the respective values were 40 vs. 43 ms (median, 
Q1: 23 vs. 25 ms; Q3: 57 vs. 61 ms, p = 0.80); in monkey P, they 
were 39 vs. 47 ms (median, Q1: 31 vs. 34 ms, Q3: 53 vs. 58 ms, 
p = 0.42). By contrast, the minimum response latency to PT 
increased significantly from A1 to LB (monkey H: median: 30 
vs. 43 ms, Q1: 17 vs. 27 ms, Q3: 42 vs. 79 ms, p < 10⁻⁴; monkey 
P: median: 27 vs. 32 ms, Q1: 16 vs. 19 ms, Q3: 38 vs. 49 ms, p < 
0.01). Corresponding latency data from single-unit recordings are 
displayed separately for the three subdivisions in Figure 3C. 

DISCUSSION 

We recorded single-unit activity from auditory core cortex (A1) 
and from the middle and anterior divisions of the lateral belt 
FIGURE 7 | Average response to the pure-tone BF and to coo calls with 
F0 matched to the BF. (A) PSTH of the responses in A1 (red) and LB (blue) 
averaged separately across all auditory neurons to their BF (left panel) and 
to a coo (right panel) whose F0 was matched to the BF. The similarly 
color-coded thin lines show the standard deviations from the average 
response. The dotted vertical line indicates sound onset. Latencies to the 
BF in LB were longer than those in A1 (median ± SE, monkey H: 
median: 39 vs. 60 ms, Q1: 29 vs. 43 ms, Q3: 48 vs. 95 ms, p < 0.001; 
monkey P: median: 38 vs. 56 ms, Q1: 27 vs. 34 ms, Q3: 56 vs. 80 ms, 
p < 0.05). By contrast, there were no latency differences between A1 and 
LB in response to a coo (monkey H: median: 35 vs. 33 ms, Q1: 11 vs. 
25 ms, Q3: 58 vs. 59 ms, p = 0.55; monkey P: median: 43 vs. 46 ms, Q1: 
37 vs. 43 ms, Q3: 59 vs. 63 ms, p = 0.38). (B) Neural latencies to the BF in 
response to pure tones (PT) and to the coo with its F0 matched to the 
neuron's BF. Unlike the latencies observed in response to PTs and PNBs 
(Figure 3), there was no difference between A1 and LB latency in response 
to coo. Furthermore, the latency to coo was significantly shorter than the 
latency to the BF in LB at the population level (two animals: 43 ms vs. 
59 ms, p < 0.01). The abbreviations used for the box and whisker plots are 
the same as in Figure 3. *p < 0.05, **p < 0.01, ***p < 0.001. 



(LB) in response to pure tones and natural coo calls in two rhesus 
monkeys while they performed an auditory discrimination task. 
There were three major findings: (1) Latencies to pure-tone and 
pink-noise stimuli were significantly longer in LB than in A1; (2) 
responses to natural coo calls, which consist of complex harmonic 
tones with a defined fundamental frequency (F0), were observed 
with essentially equal latencies in LB and A1; together with 
finding 1, this suggests neuronal facilitation by communication calls 
with harmonic structures; and (3) although multi-peaked 
neurons were found in all three divisions, peak intervals in LB 
showed harmonic relationships in both early and late responses, 
whereas harmonic peak intervals in A1 were only found in 
modest numbers and only in late responses. These findings suggest 
that LB neurons play a critical role in the processing of auditory 
harmonics in animal communication calls. 

LATENCY DIFFERENCES BETWEEN A1 AND LB 

The gradual increase of pure-tone latencies from A1 to LB (A1, 
28 ms; ML, 35 ms; AL, 44 ms; Figure 3C) is comparable to that 
observed in other studies of macaques (Recanzone et al., 2000; 
Camalier et al., 2012). We observed a similar latency increase from 
A1 to LB in response to pink noise bursts (PNB) (A1, 36 ms; ML, 
39 ms; AL, 51 ms; Figure 3C). On the other hand, Lakatos et al. 
(2005) showed that the latency to a noise stimulus was reduced 
in belt regions compared to A1. However, their recording sites 
appeared to be in the posterior medial belt, whereas ours were in 
the lateral belt. More importantly, that study used narrow-band 
noise (NBN) stimuli, which elicit a pitch percept, as opposed to 
PNB stimuli that have no pitch. Thus, the reduced latencies to 
NBN in Lakatos' study correspond more closely to the relative 
latency reduction in response to (harmonic) coo stimuli reported 
here. 

Since recent studies have reported that neuronal activity in the 
auditory cortex differs depending on task context or task demands 
(Scott et al., 2007; Sutter and Shamma, 2011; Niwa et al., 2012), 
neural latencies to S+ were analyzed separately for correct and 
incorrect trials. Although our results did not show a significant 
difference between the two conditions, it may be of interest to 
address this question more systematically in the future. This will 
require a more balanced design, since the number of error trials 
was very small (1.8%) in the present study. 

Absolute latencies were longer overall in our study than in 
previous studies. One of the main reasons for this may be the 
use of raw data (in 1-ms bins) during the baseline period, which 
causes higher variability of baseline firing rates than does using 
Gaussian-smoothed data (see Methods). Furthermore, in our 
study, the variability of baseline firing rate across all stimuli was 
taken into account. Shorter latencies are generally observed in 
studies measuring multi-unit activity and current-source density 
responses, because neural latencies can be more clearly identified 
from such signals (Lakatos et al., 2005). 

MULTI-PEAKED NEURONS AND HARMONIC INTERVALS 

Multi-peaked neurons tuned to harmonically-related intervals 
have been reported in the primary auditory cortex of several 
species, including bats (Suga et al., 1979), marmosets (Kadia 
and Wang, 2003), and cats (Oonishi and Katsuki, 1965; Sutter 
and Schreiner, 1991; Eggermont, 2007; Noreña et al., 2008). 
Specifically, octave and perfect-fifth coding has been reported 
in A1 of cats (perfect fifth: Sutter and Schreiner, 1991) and 
marmosets (octave: Kadia and Wang, 2003). While all 
studies agree that spectral integration begins already at an early 
stage of auditory cortical processing, our study demonstrates 
that the number of neurons with harmonically-related intervals 
between best-frequency peaks increases significantly from A1 to 
LB (Figure 5). Furthermore, while we found multi-peaked 
neurons with harmonic intervals in both A1 and LB, there was a 
clear difference between the two regions in terms of response 
type: The distribution of peak distances in LB had a 
maximum at the perfect fifth for both early (<70 ms) and late 
response components (>70 ms) and a peak at one octave for 
late response components. By contrast, in A1 only a peak at 
the perfect fifth was found, and only for late response 
components (Figure 5C). Different (preferred) harmonic intervals 
were reported in A1 of cats (perfect fifth: Sutter and Schreiner, 
1991), marmosets (octave: Kadia and Wang, 2003), and humans 
(Moerel et al., 2013) and it would be interesting to perform a 
cross-species comparison of preferred harmonic intervals in 
multi-peaked responses as well as their cortical distribution in the 
future. 

The relatively small amount of harmonic tuning observed in 
the early responses of A1 neurons suggests the possibility that LB 
is the first stage of convergence of inputs creating harmonic 
tuning, and that A1 neurons may reflect harmonic tuning mainly 
via feedback from higher-order regions like LB. Alternatively, 
it is possible that LB receives direct thalamic inputs that 
integrate over a broad frequency range at regular frequency intervals, 
or that inhibitory intracortical inputs play a role in 
sculpting the harmonically-related intervals. Taking all the evidence 
together, it seems most likely that convergent cortical projections 
from A1 create harmonically tuned cells in LB. This mechanism 
is commonly referred to as spectral "combination sensitivity" 
(Suga et al., 1979; Margoliash and Fortune, 1992; Rauschecker 
et al., 1995). The overall narrower tuning in A1 compared to LB 
observed in our study is consistent with this conclusion and is also 
supported by previous findings of others (Schroeder et al., 2001; 
Fu et al., 2004; Lakatos et al., 2005). 

In one behavioral study, Izumi (2000) showed that Japanese 
macaques are poor at discriminating a single tone from 
simultaneously presented two-tone stimuli separated by either one 
octave or by a perfect fifth that share the same pitch. This 
suggests that the monkey makes use of perceptual grouping based 
on harmonically-related tones. Correspondingly, in another study 
(Kadia and Wang, 2003), response modulation was observed 
when sounds were presented outside the classical RFs of A1 in 
awake marmosets. Using a two-tone paradigm, these authors 
found that frequency-tuning peaks in multi-peaked neurons were 
often harmonically related, and they observed response 
facilitation when such harmonically related pairs of tones were presented 
simultaneously. Similar effects have been reported by other 
studies in A1 (Fitzpatrick et al., 1993; Brosch and Schreiner, 1997; 
Brosch et al., 1999; Kanwal et al., 1999), further supporting 
mechanisms of combination sensitivity. Since we did not employ a 
two-tone paradigm, direct response facilitation (increased firing 
rates) by a combination of tones was not examined here in either 
A1 or LB. Further studies will also be needed to examine whether 
neurons in A1 or LB are in fact more sensitive to consonant 
than to dissonant sound structures of a complex tone, since a 
recent study highlighted responses in primary auditory cortex to 
nonharmonic sounds (Fishman and Steinschneider, 2010). 

RESPONSES TO COMPLEX TONAL "COO" CALLS 

The average response to a PT at the best frequency (BF) and to a 
pitch-shifted coo at the same frequency also revealed that 
latencies to PTs were significantly shorter in A1 than in LB, whereas 
adding higher harmonics to a fundamental frequency facilitated 
the responses of LB neurons, resulting in essentially equal 
latencies to natural coo calls in A1 and LB (Figure 7), a finding that 
may seem surprising given the standard view of hierarchical 
cortical processing. This finding further underscores that convergence 
of inputs in LB results in facilitation of responses to complex 
harmonic tones, as LB neurons generally prefer complex sounds 
over PTs (Rauschecker et al., 1995). Alternatively, responses to 
PTs and coos could depend on input from different divisions of 
the medial geniculate nucleus (MGN) with differential frequency 
tuning and latency (Hackett, 2011). Indeed, more multi-peaked 
neurons are found in the dorsal than in the ventral part of the 
MGN (Bartlett and Wang, 2011). We showed two possible 
neuron types that may contribute to the serial and parallel processing 
in A1 and LB (Figure 6 and Supplementary Figure 1). However, 
this classification was not able to cover all the neurons recorded 
in the PT and pitch-shifted coo paradigm because of the 
constraints on BF frequency ranges (see Methods). The relationships 
between the frequency tuning of the neurons and response latency 
(Figures 6, 7) remain unclear; specifically, we found only three 
neurons showing similar tuning to pitch-shifted coos and PTs 
(Supplementary Figure 1), and this needs to be addressed in 
further studies. 

In sum, the findings of this study demonstrate that a purely 
serial model of cortical processing may be insufficient. On the 
other hand, the principles of hierarchical convergence and com- 
bination sensitivity in auditory processing (Rauschecker, 1998; 
DeWitt and Rauschecker, 2012) still stand. The latency reduc- 
tion to harmonically-structured conspecific vocalizations and the 
existence of neurons tuned to simple frequency interval ratios in 
monkey nonprimary auditory cortex could be evidence of effi- 
cient information processing for ethologically relevant sounds. 
Harmonics are among the essential acoustic structures observed 
in natural acoustic environments that are generally limited to 
species-specific vocalizations (including human speech), which 
are the main sounds of biological interest for most species. In 
this study we employed a natural vocalization instead of synthetic 
stimuli to maximize our chances of eliciting neural responses, 
based on the evidence that neurons in the anterolateral belt area 
(AL) are more responsive to species-specific vocalizations (Tian 
et al., 2001). Since the previous study treated various harmonic 
and nonharmonic vocalizations as one category ("monkey calls") 
and the F0 of the harmonic vocalizations was not varied, in this 
study we controlled the pitch and harmonic structure of mon- 
key vocalizations by using a coo call, one of the most frequently 
heard vocalizations in both field and lab environments. Although 
the coo call has ethological meaning for the animals used in this 
study, we cannot determine from our results whether LB neu- 
rons respond to the harmonic structure of the calls, or whether 
they respond instead to complex acoustic features that might 
relate to their ecological relevance. Identification of the cortical 
areas that are involved in the transition from processing com- 
plex acoustic features (i.e., pitch and harmonicities) to processing 
natural conspecific calls is an important question. This issue is 
highlighted in a recent study by Fukushima et al. (2014) using 
microelectrocorticography in awake macaques: the classification 
of vocalizations was better than that for synthetic stimuli as the 
recording sites moved from caudal to rostral within the audi- 
tory ventral stream. Further studies will be needed to address this 
point at different neurological scales, including the single-unit 
level. Also, it would be of interest to learn more about the under- 
lying neuronal mechanisms of harmonic preference observed 
at the behavioral level (Schellenberg and Trainor, 1996; Izumi, 
2000) and when this important evolutionary development first 
occurred. 
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